Abstract

This article is concerned with simultaneous tests on linear regression coefficients in
high-dimensional settings. When the dimensionality is larger than the sample size, the
classic F-test is not applicable since the sample covariance matrix is not invertible. To
overcome this issue, both Goeman, Finos and van Houwelingen (2011) and
Zhong and Chen (2011) proposed test procedures that exclude the $(X'X)^{-1}$
term from the F-statistic. However, neither of these two tests is invariant under the group of
scalar transformations. In order to treat the variables in a "fair" way, we propose a
new test statistic and establish its asymptotic normality under certain mild conditions.
Simulation studies show that our test procedure performs very well in many cases.

Keywords: Asymptotic normality; High-dimensional data; Large p, small n; U-statistics;
Scale-invariant.


1 Introduction 


In the past decades, high-dimensional data have been increasingly encountered in statistical
applications from many areas, such as hyperspectral imagery, internet portals, microarray
analysis and finance. A frequently encountered challenge in high-dimensional regression is the
detection of relevant variables. Identifying significant sets of genes which are associated
with certain clinical outcomes is very important in genomic studies; see Subramanian et al.
(2005), Efron and Tibshirani (2007) and Newton et al. (2007). The main challenge of
high-dimensional data is that the dimension p is much larger than the sample size n. When this
happens, many traditional statistical methods and theories may not work since
they assume that p remains fixed as n increases. Recently, many efforts have been devoted
to solving this problem. One approach is variable selection. Fan and Lv (2008) proposed
the Sure Independence Screening (SIS) method, based on correlation learning, to reduce
the dimensionality from high to a moderate scale that is below the sample size. Wang (2009)
extended the classic Forward Regression method to an ultra-high dimensional setup. The
other approach is hypothesis testing. To gain power and insight, it can be advantageous to
look for influence not at the level of individual variables but rather at the level of clusters
of variables. Thus, a simultaneous test on linear regression coefficients in high-dimensional
settings is needed. Goeman, Finos and van Houwelingen (2011) formulated an Empirical
Bayes test via a score test on the hyperparameter of a prior distribution assumed on the
regression coefficients. Zhong and Chen (2011) modified the classic F-statistic and proposed
a U-statistic to examine the validity of the full model, and extended their test to a linear
model augmented with the factorial design setting.

However, neither of these two tests is scalar-invariant. Intuitively speaking, their
power depends heavily on the underlying variance magnitudes since they do not use the
information in the diagonal elements of the sample covariance matrix, i.e., the variances of the
individual variables. When all the components are (approximately) homogeneous, these tests are very
powerful, whereas their performance can be severely affected if the component variances differ
much. In practice, different components may have completely different physical or biological
readings, and thus their scales will certainly not be the same. Hence, it is desirable to
develop a scalar-transformation-invariant test procedure which is able to integrate all the
individual information in a relatively “fair” way. In practice, due to confidentiality reasons,
both the response and the predictors are usually first standardized to have zero mean and unit
variance. When the dimension of the predictors is low, the test efficiency is not impacted
by this standardization. However, when the dimension of the predictors is ultra-high,
there would be a large bias in the test procedure because the variance estimators are only
root-n consistent; see Feng et al. (2012) for the case of the high-dimensional two-sample
Behrens-Fisher problem. Thus, if we first standardize the predictors, Zhong and Chen's
(2011) test is not reasonable when the dimension p is ultra-high. This motivates us to
discuss when the asymptotic normality of their test statistic still holds after standardizing the
predictors. Thus, in this article, we propose a novel test statistic which is scalar-invariant
and provide the theoretical conditions under which its asymptotic normality holds. Simulation
studies show that our proposed test has reasonable sizes and effective power.

The remainder of the paper is organized as follows. In the next section, we propose our
test statistic and establish its asymptotic normality. Simulation comparisons are conducted in
Section 3. All technical details are provided in the Appendix.


2 Test Statistics


In this article, we consider the following linear regression model

$$E(Y_i \mid X_i) = \alpha + X_i'\beta, \qquad \mathrm{var}(Y_i \mid X_i) = \sigma^2, \tag{1}$$

for $i = 1, \ldots, n$, where $X_1, \ldots, X_n$ are independent and identically distributed $p$-dimensional
covariates, $Y_1, \ldots, Y_n$ are independent responses, $\beta$ is the vector of regression coefficients,
and $\alpha$ is a nuisance intercept. To make $\beta$ identifiable, we assume that $\Sigma = \mathrm{var}(X_i)$ and
$R = \mathrm{cor}(X_i)$ are positive definite. Our interest is in testing the high-dimensional hypothesis

$$H_0: \beta = \beta_0 \quad \text{vs} \quad H_1: \beta \neq \beta_0 \tag{2}$$

for a specific $\beta_0 \in \mathbb{R}^p$. A classical method for this problem is the well-known F-test
statistic

$$F_n = \frac{(\hat\beta - \beta_0)'A'(A(U'U)^{-1}A')^{-1}(\hat\beta - \beta_0)/p}{Y'(I_n - U(U'U)^{-1}U')Y/(n-p-1)},$$

where $U = (1, X)'$, $A = (0, I_p)$ and $\hat\beta$ is the least squares estimator of $\beta$. Its advantages
include: it is invariant under linear transformations, its exact distribution is known under the
null hypothesis, and it is powerful when the dimension of the data is sufficiently small compared
with the sample size. However, Zhong and Chen (2011) showed that the power of the F-test is
adversely impacted by an increased dimension even when $p < n - 1$, reflecting a reduced degree of
freedom in estimating $\sigma^2$ when the dimensionality is close to the sample size. Moreover, the
F-test statistic is undefined when the dimension of the data is greater than the within-sample
degrees of freedom, since the sample covariance matrix is then not positive definite.
In order to overcome this issue, Goeman, Finos and van Houwelingen (2011) proposed an
Empirical Bayes test, which is formulated via a score test on the hyperparameter of a prior
distribution assumed on the regression coefficients. Their test statistic is

$$G_n = \frac{(Y - \hat\alpha - X'\beta_0)'XX'(Y - \hat\alpha - X'\beta_0)}{n(Y - \hat\alpha - X'\beta_0)'(Y - \hat\alpha - X'\beta_0)},$$

where $\hat\alpha$ is the sample mean of $Y$. The key feature of their method is to use the Euclidean norm
to replace the Mahalanobis norm, since having $(X'X)^{-1}$ is no longer beneficial when $p$ is
larger than $n$. However, the power of $G_n$ is adversely impacted by $\mu$, the mean of $X$, which
is a nuisance parameter in the test of interest. Zhong and Chen (2011) consider the U-statistic

$$Z_n = \frac{1}{4P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'(X_{i_3} - X_{i_4})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4}), \tag{4}$$

where $\Delta_i = Y_i - X_i'\beta_0$. Throughout this article, we use $\sum^{*}$ to denote summation over distinct
indexes. For example, in $Z_n$, the summation is over the set $\{i_1 \neq i_2 \neq i_3 \neq i_4\}$, for all
$i_1, i_2, i_3, i_4 \in \{1, \ldots, n\}$, and $P_n^m = n!/(n-m)!$. Obviously, $Z_n$ is not impacted by the nuisance
parameters $\alpha$ and $\mu$. They established the asymptotic normality of $Z_n$ under the diverging
factor model (Bai and Saranadasa 1996).

However, an obvious limitation of $G_n$ and $Z_n$ is that they are not invariant under scalar
transformations. To this end, we standardize each component of $(X_{i_1} - X_{i_2})'(X_{i_3} - X_{i_4})$ in
$Z_n$ by the corresponding variance and propose a simple but effective test statistic,

$$T_n = \frac{1}{4P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'D_S^{-1}(X_{i_3} - X_{i_4})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4}), \tag{5}$$

where $D_S$ is the diagonal matrix of the sample covariance matrix, that is,

$$D_S = \mathrm{diag}(\hat\sigma_1^2, \ldots, \hat\sigma_p^2),$$

where $\hat\sigma_k^2$ is the sample variance of the $k$-th component of $X_i$, $k = 1, \ldots, p$. Obviously, $T_n$ is
invariant to location shifts in both $X_i$ and $Y_i$. Thus, we assume, without loss of generality,
that $\alpha = \mu = 0$ in the rest of the article. Moreover, $T_n$ is invariant under the group of scalar
transformations, say, $X_i \to CX_i$ for $i = 1, \ldots, n$, where $C = \mathrm{diag}\{c_1, \ldots, c_p\}$ and $c_1, \ldots, c_p$ are non-zero
constants.
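
The definition of the statistic can be checked numerically on a toy data set. The following is only a minimal brute-force sketch, assuming the normalization $1/(4P_n^4)$ and the reading that a rescaling $X_i \to CX_i$ is accompanied by $\beta_0 \to C^{-1}\beta_0$ so that the null hypothesis is unchanged; the function name `t_n` and the toy values are hypothetical, and the $O(n^4 p)$ loop is meant for illustration rather than for real data.

```python
from itertools import permutations
from math import perm
import random
from statistics import variance


def t_n(X, Y, beta0):
    """Brute-force T_n: average over ordered 4-tuples of distinct indices."""
    n, p = len(X), len(X[0])
    # Delta_i = Y_i - X_i' beta_0
    delta = [Y[i] - sum(X[i][k] * beta0[k] for k in range(p)) for i in range(n)]
    # Diagonal of D_S: sample variance of each coordinate of X
    s2 = [variance([X[i][k] for i in range(n)]) for k in range(p)]
    total = 0.0
    for i1, i2, i3, i4 in permutations(range(n), 4):
        quad = sum((X[i1][k] - X[i2][k]) * (X[i3][k] - X[i4][k]) / s2[k]
                   for k in range(p))
        total += quad * (delta[i1] - delta[i2]) * (delta[i3] - delta[i4])
    return total / (4 * perm(n, 4))


# Toy data (hypothetical values, for illustration only)
rng = random.Random(0)
n, p = 6, 3
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
Y = [rng.gauss(0.0, 1.0) for _ in range(n)]
beta0 = [0.5, -1.0, 2.0]

# Rescale X -> CX and beta0 -> C^{-1} beta0, so that X_i' beta0 is unchanged
c = [2.0, -3.0, 0.1]
X_scaled = [[c[k] * row[k] for k in range(p)] for row in X]
beta0_scaled = [beta0[k] / c[k] for k in range(p)]

base = t_n(X, Y, beta0)
scaled = t_n(X_scaled, Y, beta0_scaled)
# Location shifts in X and Y cancel in the pairwise differences
shifted = t_n([[v + 5.0 for v in row] for row in X], [y + 3.0 for y in Y], beta0)
```

Up to floating-point error, `base`, `scaled` and `shifted` agree, which is the location and scalar invariance described above.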

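In order to establish the asymptotic normality of $T_n$, we assume, like Bai and Saranadasa
(1996), the following diverging factor model:

$$X_i = \Gamma z_i + \mu,$$

where $\Gamma$ is a $p \times m$ matrix for some $m \ge p$ such that $\Gamma\Gamma' = \Sigma$, and $z_1, \ldots, z_n$ are $m$-variate
independent and identically distributed random vectors such that

$$E(z_i) = 0, \quad \mathrm{var}(z_i) = I_m, \quad E(z_{il}^4) = 3 + \Delta, \quad E(z_{il}^8) = m_8 \in (0, \infty),$$

$$E(z_{ik_1}^{\alpha_1} z_{ik_2}^{\alpha_2} \cdots z_{ik_q}^{\alpha_q}) = E(z_{ik_1}^{\alpha_1})E(z_{ik_2}^{\alpha_2}) \cdots E(z_{ik_q}^{\alpha_q}), \tag{6}$$

whenever $\sum_{l=1}^{q}\alpha_l \le 8$ and $k_1 \neq k_2 \neq \cdots \neq k_q$. Additionally, we need the following conditions
to regulate the “large p, small n” setting:

(C1) $p(n) \to \infty$ as $n \to \infty$;

(C2) $\mathrm{tr}(R^4) = o(\mathrm{tr}^2(R^2))$;

(C3) $\dfrac{p^2}{n^2\,\mathrm{tr}(R^2)} \to 0$.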

Remark 1 Conditions (C1) and (C2) are similar to condition (2.8) in Zhong and Chen
(2011). Since the variance estimator $\hat\sigma_k^2$ is only root-n consistent, there is a small bias term
in the variance of $T_n$. Fortunately, the bias term is negligible when condition (C3)
holds. To appreciate condition (C3), consider the simple case $R = I_p$, in which the condition
becomes $p = o(n^2)$. When $p$ gets larger, such as $p = O(n^2)$, the bias term in the variance of
$T_n$ is no longer negligible. A bias correction is then needed to solve this problem; see
Feng et al. (2012) for more information.

In order to study the asymptotic power of our test, similar to Zhong and Chen (2011),
we define the following local alternatives

$$(\beta - \beta_0)'\Sigma(\beta - \beta_0) = o(1),$$

$$(\beta - \beta_0)'\Sigma D^{-1}\Sigma D^{-1}\Sigma(\beta - \beta_0) = o(n^{-1}\mathrm{tr}(R^2)), \tag{7}$$

where $D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$ denotes the diagonal matrix of $\Sigma$.
Note that the local alternatives (7) prescribe a smaller difference between $\beta$ and $\beta_0$. Similar
to Zhong and Chen (2011), we also consider in the Appendix two different fixed alternatives which
violate the first part of (7), and we demonstrate that our proposed test can achieve at
least 50% power under these two fixed alternatives.

The following theorem establishes the asymptotic normality of $T_n$ under the null hypothesis or the
local alternative (7).

Theorem 1 Assume conditions (C1)-(C3) hold. Then under either $H_0$ or the local alternative
(7), as $n \to \infty$,

$$\frac{n}{\sigma^2\sqrt{2\,\mathrm{tr}(R^2)}}\left(T_n - \|D^{-1/2}\Sigma(\beta - \beta_0)\|^2\right) \xrightarrow{d} N(0, 1). \tag{8}$$

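To formulate a test procedure based on $T_n$, we need to estimate $\mathrm{tr}(R^2)$ and $\sigma^2$, which appear
in the asymptotic variance. In order to reduce the computational work, we propose the
following ratio-consistent estimator of $\mathrm{tr}(R^2)$,

$$\widehat{\mathrm{tr}(R^2)} = \frac{1}{2P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'D_S^{-1}(X_{i_3} - X_{i_4})(X_{i_3} - X_{i_2})'D_S^{-1}(X_{i_1} - X_{i_4}).$$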

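And the estimator of $\sigma^2$ under $H_0$ is

$$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\Delta_i - \bar\Delta)^2, \qquad \bar\Delta = \frac{1}{n}\sum_{i=1}^{n}\Delta_i.$$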

Proposition 1 Suppose the conditions in Theorem 1 hold. Then, as $n, p \to \infty$,

$$\frac{\widehat{\mathrm{tr}(R^2)}}{\mathrm{tr}(R^2)} \xrightarrow{p} 1.$$

Applying Theorem 1 and the Slutsky theorem, the proposed test rejects $H_0$ at significance
level $\alpha$ if

$$nT_n \ge \sqrt{2\,\widehat{\mathrm{tr}(R^2)}}\;\hat\sigma^2 z_\alpha, \tag{9}$$

where $z_\alpha$ is the upper-$\alpha$ quantile of $N(0, 1)$.
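
The rejection rule (9) is simple to apply once the three ingredients are available. The snippet below is a small sketch, assuming that the values of $T_n$, $\widehat{\mathrm{tr}(R^2)}$ and $\hat\sigma^2$ have already been computed elsewhere; the function names `reject_h0` and `p_value` are hypothetical, and the one-sided p-value is simply the normal approximation implied by Theorem 1.

```python
from math import sqrt
from statistics import NormalDist


def reject_h0(t_n, n, tr_r2_hat, sigma2_hat, level=0.05):
    """Return True when n*T_n >= sqrt(2*tr_r2_hat) * sigma2_hat * z_level."""
    z = NormalDist().inv_cdf(1.0 - level)  # upper-level quantile of N(0, 1)
    return n * t_n >= sqrt(2.0 * tr_r2_hat) * sigma2_hat * z


def p_value(t_n, n, tr_r2_hat, sigma2_hat):
    """One-sided p-value from the standardized statistic under H0."""
    standardized = n * t_n / (sqrt(2.0 * tr_r2_hat) * sigma2_hat)
    return 1.0 - NormalDist().cdf(standardized)
```

For instance, with the hypothetical inputs `n = 30`, `tr_r2_hat = 100.0` and `sigma2_hat = 1.0`, a value `t_n = 1.0` leads to rejection at the 5% level, while `t_n = 0.5` does not.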


Next, we discuss the power properties of the proposed test. According to Theorem 1,
the power of our proposed test under the local alternative (7) is

$$\beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\|D^{-1/2}\Sigma(\beta - \beta_0)\|^2}{\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2}\right),$$

where $\Phi$ is the standard normal distribution function. In comparison, Zhong and Chen
(2011) show that the power of their proposed test is

$$\beta_{Z_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\|\Sigma(\beta - \beta_0)\|^2}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}\,\sigma^2}\right).$$

Note that it is difficult to compare the proposed test with Zhong and Chen’s (2011) test under
general settings. Thus, in order to get a rough picture of the asymptotic power comparison
between these two tests, we consider the following representative cases:


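(i) The variances of all variables are equal to $\lambda$, and then $\Sigma = \lambda R$. In this case,

$$\beta_{Z_n}(\|\beta - \beta_0\|) = \beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\lambda\|R(\beta - \beta_0)\|^2}{\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2}\right).$$

(ii) $\Sigma(\beta - \beta_0) = \delta(1, 1, \ldots, 1)'$. In this case,

$$\beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\,\mathrm{tr}(D^{-1})\delta^2}{\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2}\right),$$

$$\beta_{Z_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{np\delta^2}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}\,\sigma^2}\right).$$

According to the Cauchy inequality,

$$\mathrm{tr}^2(D^{-1})\,\mathrm{tr}(\Sigma^2) \ge p^2\,\mathrm{tr}(R^2).$$

As a consequence,

$$\beta_{T_n}(\|\beta - \beta_0\|) \ge \beta_{Z_n}(\|\beta - \beta_0\|).$$

When the variances of all the variables are equal, the two tests are equally
powerful. Otherwise, the proposed test is preferable in this case.

(iii) $\Sigma$ is a diagonal matrix, i.e. $\Sigma = D$. The variances of the first half of the components are $\sigma_1^2$
and the rest are all $\sigma_2^2$. Assume $\beta_i - \beta_{0i} = \delta$, $i = 1, \ldots, \lfloor p/2 \rfloor$, and the others are all equal
to zero. In this setting,

$$\beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\sqrt{p}\,\sigma_1^2\delta^2}{2\sqrt{2}\,\sigma^2}\right),$$

$$\beta_{Z_n}(\|\beta - \beta_0\|) = \Phi\left(-z_\alpha + \frac{n\sqrt{p}\,\sigma_1^4\delta^2}{2\sqrt{\sigma_1^4 + \sigma_2^4}\,\sigma^2}\right).$$

Thus, the asymptotic relative efficiency (ARE) of the proposed test with respect to
Zhong and Chen’s (2011) test is $\sqrt{\sigma_1^4 + \sigma_2^4}/(\sqrt{2}\,\sigma_1^2)$. It is clear that the
proposed test is more powerful than Zhong and Chen’s (2011) test if $\sigma_1^2 < \sigma_2^2$, and vice
versa. This ARE has a positive lower bound of $1/\sqrt{2}$ when $\sigma_1^2 \gg \sigma_2^2$, whereas it can
be arbitrarily large if $\sigma_1^2/\sigma_2^2$ is close to zero.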
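
To make case (iii) concrete, the two asymptotic power expressions and the ARE can be evaluated numerically. This is only an illustrative sketch based on the formulas displayed above; the function names and the arguments `s1` and `s2` (standing for $\sigma_1^2$ and $\sigma_2^2$) are hypothetical, and $p$ is taken to be even.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def power(signal, tr_sq, n, sigma2, level=0.05):
    """Phi(-z_level + n*signal / (sqrt(2*tr_sq) * sigma2))."""
    z = _NORMAL.inv_cdf(1.0 - level)
    return _NORMAL.cdf(-z + n * signal / (sqrt(2.0 * tr_sq) * sigma2))


def case_iii(n, p, s1, s2, delta, sigma2, level=0.05):
    """Asymptotic powers of T_n and Z_n, and the ARE, in case (iii).

    s1 and s2 are the variances of the first and second half of the components.
    """
    half = p // 2
    signal_t = half * s1 * delta**2        # ||D^{-1/2} Sigma (beta - beta0)||^2
    signal_z = half * s1**2 * delta**2     # ||Sigma (beta - beta0)||^2
    tr_r2 = p                              # R = I_p when Sigma is diagonal
    tr_sigma2 = half * s1**2 + (p - half) * s2**2
    are = sqrt(s1**2 + s2**2) / (sqrt(2.0) * s1)
    return (power(signal_t, tr_r2, n, sigma2, level),
            power(signal_z, tr_sigma2, n, sigma2, level),
            are)
```

With equal variances the two powers coincide and the ARE equals 1; when `s1 < s2` the ARE exceeds 1, and when `s1 > s2` it falls below 1 but stays above $1/\sqrt{2}$.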


3 Simulation 

Here we report a simulation study designed to evaluate the performance of our proposed
test (abbreviated as SF). For comparison purposes, we also conducted the test proposed
by Zhong and Chen (2011) (abbreviated as ZC) and the Empirical Bayes test proposed by
Goeman, Finos, and van Houwelingen (2011) (abbreviated as EB). We consider the following
linear regression model, as in Zhong and Chen (2011):

$$Y_i = X_i'\beta + \varepsilon_i, \tag{10}$$

and the hypotheses to be tested are

$$H_0: \beta = 0_{p \times 1} \quad \text{vs} \quad H_1: \beta \neq 0_{p \times 1}. \tag{11}$$

We consider two distributions for $\varepsilon_i$: one is $N(0, 4)$, the other is the centralized gamma
distribution Gamma(1, 0.5). The covariates $X_i = (X_{i1}, \ldots, X_{ip})$ are generated according to the
following moving average model

$$X_{ij} = \rho_1 Z_{ij} + \rho_2 Z_{i(j+1)} + \cdots + \rho_T Z_{i(j+T-1)} + \mu_j$$

for $j = 1, \ldots, p$ and $T < p$. Here $\{Z_{ij}\}_{j=1}^{p+T-1}$ are, respectively, i.i.d. random variables. We
consider two scenarios for the innovations $Z_{ij}$: (Scenario I) all the $\{Z_{ij}\}$ are from $N(0, 1)$;
(Scenario II) the first half of the components of $\{Z_{ij}\}_{j=1}^{p+T-1}$ are from $N(0, 1)$, and the remaining
half are from the centralized Gamma(4, 1). The coefficients $\{\rho_l\}_{l=1}^{T}$ were generated
independently from $U(0, 1)$ and were kept fixed once generated throughout our simulations. The
means $\{\mu_j\}_{j=1}^{p}$ are also fixed constants generated from $U(2, 3)$. We chose $T = 10$ and $T = 20$
to generate different covariances of $X_i$. Similar to Zhong and Chen (2011), we consider two
configurations of the alternative hypothesis $H_1$. One is the “nonsparse case”, in which the
first half of the $\beta$-components are nonzero with equal magnitude. The other is the “sparse
case”, in which only the first five components are nonzero, with equal magnitude. In both cases,
we fixed $\|\beta\|^2$ at three levels: 0.03, 0.06, 0.09. Here we only consider the case $p > n$ and
chose (n,p) = (30,100), (40,200), (50,400).
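
The data-generating design described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the reported tables: only the $N(0, 4)$ residual is implemented, "first half" of the innovations is read as the first $\lfloor (p+T-1)/2 \rfloor$ indices, and the function names, the seed and the integer coding of the scenarios (1 for Scenario I, 2 for Scenario II) are assumptions made for the example.

```python
import random


def make_design(p, T, rng):
    """Draw the fixed MA coefficients rho_l ~ U(0,1) and means mu_j ~ U(2,3)."""
    rho = [rng.uniform(0.0, 1.0) for _ in range(T)]
    mu = [rng.uniform(2.0, 3.0) for _ in range(p)]
    return rho, mu


def make_beta(p, norm_sq, sparse):
    """Equal-magnitude coefficients on the first 5 (sparse) or first p//2 entries."""
    k = 5 if sparse else p // 2
    b = (norm_sq / k) ** 0.5
    return [b] * k + [0.0] * (p - k)


def draw_sample(n, rho, mu, beta, scenario, rng):
    """Generate (X, Y) with N(0, 4) residuals under scenario 1 or 2."""
    p, T = len(mu), len(rho)
    q = p + T - 1  # number of innovations per observation

    def innovation(j):
        if scenario == 1 or j < q // 2:
            return rng.gauss(0.0, 1.0)
        return rng.gammavariate(4.0, 1.0) - 4.0  # Gamma(4, 1) minus its mean

    X, Y = [], []
    for _ in range(n):
        z = [innovation(j) for j in range(q)]
        x = [sum(rho[t] * z[j + t] for t in range(T)) + mu[j] for j in range(p)]
        y = sum(xj * bj for xj, bj in zip(x, beta)) + rng.gauss(0.0, 2.0)
        X.append(x)
        Y.append(y)
    return X, Y


rng = random.Random(2024)
rho, mu = make_design(p=100, T=10, rng=rng)
beta = make_beta(p=100, norm_sq=0.06, sparse=False)
X, Y = draw_sample(30, rho, mu, beta, scenario=2, rng=rng)
```

Keeping `rho` and `mu` outside `draw_sample` mirrors the description that these constants are generated once and then held fixed across replications.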

Tables 1-2 and Tables 3-4 report the empirical sizes and powers with normally and
centralized gamma distributed residuals, respectively. From Tables 1 to 4, we observe
that the empirical sizes of all three tests are reasonable, and that the sizes
become closer to the nominal level 0.05 when n and p get larger, which is similar
to Zhong and Chen's (2011) results. Moreover, from Tables 1 and 3, we observe that when all
the component variances are equal (Scenario I), our proposed SF test performs similarly to
the ZC and EB tests. Even though we need to estimate the variance of each component,
our proposed SF test does not lose much information from the samples when the dimension
p is of a smaller order than $n^2$. These findings are also consistent with the asymptotic comparison
in Section 2. However, when the component variances are not equal (Scenario II),
our proposed SF test is clearly much more powerful than the other two tests. This is mainly
due to the fact that the ZC and EB tests are not scale-invariant. When the variances
of the variables are not equal, the ZC and EB tests can hardly capture the coefficient shifts of
the components with smaller variances, and they become powerless in such cases. Thus, it is not
surprising that their performance is extremely poor in such cases.


Table 1: Empirical size and power comparisons at 5% significance for normal residual under
Scenario I

                                  T = 10                    T = 20
(n,p)       $\|\beta\|^2$     SF      ZC      EB        SF      ZC      EB
(a) nonsparse case
(30,100)    0.00             0.06    0.06    0.05      0.06    0.06    0.06
            0.03             0.27    0.31    0.27      0.75    0.71    0.76
            0.06             0.53    0.54    0.46      0.93    0.91    0.95
            0.09             0.69    0.70    0.64      0.97    0.96    0.98
(40,200)    0.00             0.05    0.05    0.04      0.05    0.05    0.05
            0.03             0.25    0.25    0.37      0.78    0.77    0.82
            0.06             0.47    0.49    0.60      0.95    0.95    0.96
            0.09             0.63    0.69    0.76      0.98    0.97    1.00
(50,400)    0.00             0.05    0.05    0.06      0.05    0.05    0.04
            0.03             0.22    0.22    0.14      0.81    0.79    0.79
            0.06             0.45    0.42    0.33      0.94    0.95    1.00
            0.09             0.61    0.60    0.43      0.97    0.97    1.00
(b) sparse case
(30,100)    0.03             0.12    0.13    0.07      0.15    0.15    0.16
            0.06             0.20    0.19    0.12      0.24    0.24    0.26
            0.09             0.24    0.23    0.15      0.31    0.30    0.35
(40,200)    0.03             0.11    0.12    0.14      0.16    0.14    0.17
            0.06             0.18    0.18    0.22      0.25    0.26    0.32
            0.09             0.24    0.24    0.32      0.40    0.39    0.48
(50,400)    0.03             0.10    0.10    0.11      0.12    0.12    0.15
            0.06             0.15    0.15    0.17      0.25    0.25    0.22
            0.09             0.18    0.21    0.28      0.34    0.33    0.33


Table 2: Empirical size and power comparisons at 5% significance for normal residual under
Scenario II

                                  T = 10                    T = 20
(n,p)       $\|\beta\|^2$     SF      ZC      EB        SF      ZC      EB
(a) nonsparse case
(30,100)    0.00             0.06    0.06    0.06      0.05    0.07    0.06
            0.03             0.29    0.10    0.12      0.70    0.28    0.34
            0.06             0.48    0.17    0.18      0.92    0.50    0.53
            0.09             0.63    0.23    0.27      0.96    0.59    0.59
(40,200)    0.00             0.05    0.05    0.06      0.05    0.05    0.04
            0.03             0.35    0.15    0.10      0.77    0.31    0.32
            0.06             0.55    0.19    0.12      0.95    0.48    0.51
            0.09             0.65    0.23    0.14      0.97    0.59    0.63
(50,400)    0.00             0.05    0.05    0.05      0.06    0.06    0.05
            0.03             0.23    0.10    0.07      0.80    0.34    0.33
            0.06             0.45    0.15    0.10      0.95    0.48    0.49
            0.09             0.60    0.17    0.14      0.99    0.55    0.60
(b) sparse case
(30,100)    0.03             0.09    0.06    0.05      0.16    0.07    0.06
            0.06             0.18    0.07    0.07      0.28    0.10    0.11
            0.09             0.23    0.09    0.07      0.40    0.11    0.12
(40,200)    0.03             0.13    0.13    0.07      0.16    0.09    0.08
            0.06             0.16    0.14    0.09      0.28    0.11    0.10
            0.09             0.24    0.14    0.11      0.35    0.13    0.12
(50,400)    0.03             0.07    0.05    0.07      0.15    0.11    0.06
            0.06             0.14    0.05    0.09      0.25    0.12    0.09
            0.09             0.18    0.06    0.09      0.33    0.14    0.12


4 Appendix 

4.1 Proof of Theorem 1 

Define $D$ as the diagonal matrix of the covariance matrix $\Sigma$, that is,

$$D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2).$$

Thus, we can rewrite $T_n$ as follows:

$$T_n = \frac{1}{4P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'D^{-1}(X_{i_3} - X_{i_4})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})$$
$$\quad + \frac{1}{4P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'(D_S^{-1} - D^{-1})(X_{i_3} - X_{i_4})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})$$
$$= T_{n1} + T_{n2}.$$

Define

$$\phi(i_1, i_2, i_3, i_4) = \frac{1}{4}(X_{i_1} - X_{i_2})'D^{-1}(X_{i_3} - X_{i_4})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4}).$$


Table 3: Empirical size and power comparisons at 5% significance for centralized gamma
residual under Scenario I

                                  T = 10                    T = 20
(n,p)       $\|\beta\|^2$     SF      ZC      EB        SF      ZC      EB
(a) nonsparse case
(30,100)    0.00             0.05    0.06    0.06      0.06    0.06    0.06
            0.03             0.33    0.31    0.25      0.79    0.76    0.84
            0.06             0.56    0.58    0.43      0.90    0.88    0.95
            0.09             0.70    0.72    0.58      0.96    0.95    0.98
(40,200)    0.00             0.05    0.06    0.06      0.06    0.06    0.03
            0.03             0.29    0.31    0.30      0.83    0.82    0.96
            0.06             0.48    0.48    0.53      0.95    0.94    0.99
            0.09             0.65    0.64    0.71      0.98    0.98    1.00
(50,400)    0.00             0.05    0.05    0.07      0.05    0.05    0.06
            0.03             0.29    0.28    0.39      0.80    0.80    0.85
            0.06             0.53    0.53    0.63      0.95    0.94    0.98
            0.09             0.66    0.65    0.78      0.98    0.98    0.99
(b) sparse case
(30,100)    0.03             0.11    0.14    0.15      0.20    0.20    0.17
            0.06             0.18    0.19    0.22      0.32    0.32    0.30
            0.09             0.25    0.27    0.32      0.40    0.42    0.41
(40,200)    0.03             0.12    0.12    0.10      0.21    0.21    0.26
            0.06             0.19    0.18    0.17      0.32    0.31    0.46
            0.09             0.24    0.23    0.19      0.39    0.39    0.57
(50,400)    0.03             0.11    0.10    0.10      0.15    0.14    0.14
            0.06             0.15    0.15    0.20      0.25    0.23    0.28
            0.09             0.21    0.20    0.24      0.36    0.35    0.41


Table 4: Empirical size and power comparisons at 5% significance for centralized gamma
residual under Scenario II

                                  T = 10                    T = 20
(n,p)       $\|\beta\|^2$     SF      ZC      EB        SF      ZC      EB
(a) nonsparse case
(30,100)    0.00             0.06    0.06    0.05      0.06    0.06    0.05
            0.03             0.38    0.13    0.12      0.76    0.38    0.35
            0.06             0.56    0.17    0.17      0.93    0.57    0.55
            0.09             0.70    0.25    0.21      0.98    0.66    0.66
(40,200)    0.00             0.06    0.05    0.05      0.05    0.06    0.05
            0.03             0.30    0.11    0.12      0.75    0.28    0.25
            0.06             0.49    0.18    0.18      0.97    0.47    0.38
            0.09             0.63    0.23    0.23      0.99    0.61    0.50
(50,400)    0.00             0.06    0.06    0.07      0.06    0.06    0.05
            0.03             0.30    0.12    0.10      0.79    0.34    0.46
            0.06             0.49    0.16    0.14      0.96    0.45    0.64
            0.09             0.62    0.19    0.17      0.99    0.53    0.71
(b) sparse case
(30,100)    0.03             0.10    0.09    0.06      0.19    0.09    0.08
            0.06             0.20    0.09    0.08      0.30    0.10    0.10
            0.09             0.27    0.09    0.08      0.40    0.14    0.12
(40,200)    0.03             0.12    0.10    0.06      0.17    0.08    0.06
            0.06             0.20    0.10    0.08      0.32    0.10    0.09
            0.09             0.25    0.11    0.09      0.43    0.13    0.10
(50,400)    0.03             0.16    0.08    0.05      0.15    0.11    0.07
            0.06             0.22    0.10    0.07      0.25    0.10    0.09
            0.09             0.28    0.10    0.08      0.37    0.12    0.11


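And then we symmetrize $\phi$ by

$$h(W_i, W_j, W_k, W_l) = \frac{1}{3}\{\phi(i, j, k, l) + \phi(i, k, j, l) + \phi(i, l, j, k)\},$$

where $W_i = (X_i', \varepsilon_i)'$ and $\varepsilon_i = Y_i - X_i'\beta$. Thus

$$T_{n1} = \binom{n}{4}^{-1}\sum_{i<j<k<l} h(W_i, W_j, W_k, W_l).$$

Define $\delta_\beta = \beta - \beta_0$. After some tedious calculation, we can obtain that the projections of $h$ are,
respectively,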

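$$h_1(W_1) = \frac{1}{2}\delta_\beta'(X_1X_1' + \Sigma)D^{-1}\Sigma\delta_\beta + \frac{1}{2}\varepsilon_1X_1'D^{-1}\Sigma\delta_\beta,$$

$$h_2(W_1, W_2) = \frac{1}{6}\Big\{\delta_\beta'(X_1 - X_2)(X_1 - X_2)'D^{-1}\Sigma\delta_\beta + (\varepsilon_1 - \varepsilon_2)(X_1 - X_2)'D^{-1}\Sigma\delta_\beta$$
$$\qquad + \big(\delta_\beta'(X_1X_1' + \Sigma) + \varepsilon_1X_1'\big)D^{-1}\big((X_2X_2' + \Sigma)\delta_\beta + \varepsilon_2X_2\big)\Big\},$$

$$h_3(W_1, W_2, W_3) = \frac{1}{12}\big((X_1 - X_2)'\delta_\beta + (\varepsilon_1 - \varepsilon_2)\big)(X_1 - X_2)'D^{-1}\big((X_3X_3' + \Sigma)\delta_\beta + \varepsilon_3X_3\big)$$
$$\qquad + \frac{1}{12}\big((X_1 - X_3)'\delta_\beta + (\varepsilon_1 - \varepsilon_3)\big)(X_1 - X_3)'D^{-1}\big((X_2X_2' + \Sigma)\delta_\beta + \varepsilon_2X_2\big)$$
$$\qquad + \frac{1}{12}\big((X_2 - X_3)'\delta_\beta + (\varepsilon_2 - \varepsilon_3)\big)(X_2 - X_3)'D^{-1}\big((X_1X_1' + \Sigma)\delta_\beta + \varepsilon_1X_1\big).$$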

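Define $B_1 = \delta_\beta'\Sigma\delta_\beta$, $B_2 = \delta_\beta'\Sigma D^{-1}\Sigma\delta_\beta$, $B_3 = \delta_\beta'\Sigma D^{-1}\Sigma D^{-1}\Sigma\delta_\beta$ and $A_0 = \Gamma'D^{-1}\Gamma$,
$A_1 = \Gamma'\delta_\beta\delta_\beta'\Gamma$, $A_2 = \Gamma'D^{-1}\Sigma\delta_\beta\delta_\beta'\Sigma D^{-1}\Gamma$, $A_3 = \Gamma'D^{-1}\Sigma D^{-1}\Gamma$. Then,

$$\mathrm{var}(h_1) = \frac{1}{4}\{(B_1 + \sigma^2)B_3 + B_2^2 + \Delta\,\mathrm{tr}(A_1 \circ A_2)\}$$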

var(h2) = —|(J^tr(R^) + 2IR2 + 22R1R3 + 22 a^B^ + i?^tr(R^) + 2 a\i{B?‘)Bi 
36 I 

+ 2A(Ri + cT^)tr(Ai o A 3 ) + 20Atr(Ai o A 2 ) + AV[(Aodiag(Ai))^]| 
var(h) =^|l 2 cT'^tr(R^) + 45 i ?2 + 65 R 1 R 3 + 40ct^R3 + 10 i?itr(R^) + 24(T^tr(R^)Ri 
+ 12A(Ri + (T^)tr(Ai o A 3 ) + 37Atr(Ai o A 2 ) + 4A^tr[(Aodiag(Ai))^]| 

Thus, $\mathrm{var}(h_2)$ and $\mathrm{var}(h)$ are of the same order. Next, following the same procedure as Zhong
and Chen (2011), under condition (7), we can show that

r„i = ^gig,X;X^- + Op(Vvar(T„i)) 

1 ) i<j 

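And then, similar to Zhong and Chen (2011), we can easily obtain that

$$\frac{n}{\sigma^2\sqrt{2\,\mathrm{tr}(R^2)}}\left(T_{n1} - \|D^{-1/2}\Sigma(\beta - \beta_0)\|^2\right) \xrightarrow{d} N(0, 1) \tag{12}$$

by applying the martingale central limit theorem (Hall and Heyde 1980).
In order to prove Theorem 1, we only need to show that $T_{n2} = o_p(n^{-1}\sigma^2\sqrt{2\,\mathrm{tr}(R^2)})$.

$$T_{n2} = \frac{1}{4P_n^4}\sum^{*}\sum_{k=1}^{p}(X_{i_1k} - X_{i_2k})(X_{i_3k} - X_{i_4k})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})(\hat\sigma_k^{-2} - \sigma_k^{-2})$$

$$= \frac{1}{4P_n^4}\sum^{*}\sum_{k=1}^{p}(X_{i_1k} - X_{i_2k})(X_{i_3k} - X_{i_4k})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})(\sigma_k^2 - \hat\sigma_k^2)\sigma_k^{-2}\hat\sigma_k^{-2}$$

$$= \frac{1}{4P_n^4}\sum^{*}\sum_{k=1}^{p}(X_{i_1k} - X_{i_2k})(X_{i_3k} - X_{i_4k})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})(\sigma_k^2 - \hat\sigma_k^2)\sigma_k^{-4}$$

$$\quad + \frac{1}{4P_n^4}\sum^{*}\sum_{k=1}^{p}(X_{i_1k} - X_{i_2k})(X_{i_3k} - X_{i_4k})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})(1 - \hat\sigma_k^2\sigma_k^{-2})^2\hat\sigma_k^{-2}$$

$$= A_1 + A_2$$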
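
The split into $A_1$ and $A_2$ rests on a scalar identity for each coordinate: $\hat\sigma_k^{-2} - \sigma_k^{-2} = (\sigma_k^2 - \hat\sigma_k^2)\sigma_k^{-4} + (1 - \hat\sigma_k^2\sigma_k^{-2})^2\hat\sigma_k^{-2}$. A quick numerical check of this identity, written as a small sketch with hypothetical names (`s_hat` for $\hat\sigma_k^2$ and `s` for $\sigma_k^2$), is:

```python
def split_inverse_error(s_hat, s):
    """Return (first_order, remainder) with 1/s_hat - 1/s = first_order + remainder."""
    first_order = (s - s_hat) / s**2             # (sigma^2 - sigma_hat^2) * sigma^{-4}
    remainder = (1.0 - s_hat / s) ** 2 / s_hat   # (1 - sigma_hat^2 / sigma^2)^2 / sigma_hat^2
    return first_order, remainder
```

The first term is linear in the estimation error and drives $A_1$, while the remainder is quadratic in the error and nonnegative, which is why $A_2$ is handled separately.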
Firstly, we will show that $E(A_1^2) = o(n^{-2}\sigma^4\mathrm{tr}(R^2))$.

ml) 


p p 


iGiP^y 


E (^2i/c ^i 2 k)i.^i 3 k ^22) (^^3 ^24) ^k^^k 


k=l 1=1 '^1>^25^35^4 

* 

X Y1 - mm - mm - mm - mi^^f - 


-4 


*5,*6j*7,*8 


P P 


E E ^ E (A, - A,J (A ,3 - A,J (A,, - A, 3 ) (A ,3 - A,, ^ 


16(P4)2 

k=i i=i 

X (.^iik ^i2k)(^^i3k ^i4k){^iil ^*50(^*3^ ^iel) 


*l,*2,*3i*4,*5,*6 


X 


n 




2=1 


n{n — 1) 


J2 ^ikXjk I ^'il) + 


P P 


n n 


i=l 


nin — 1) ^ 








* 1 ,* 2 ,*3,* 4 ,*5,*6 *=1 i = l 


X C^z CT;;, (xzifc - Xi^k){Xi^k - xm^hi - Ximhi - Xi^k - Xik){(^l - Xji) 
1 


p p 


+ 


16n2(n-l)(P4)2^^ 


EE^ E E E(^- --a.j(a ,3 -a, 3 ^ 


17 = 1 


X (-^^3 (Tj^ i^^iik ^i 2 k^i_^i 3 k ^ 24 /c)(^ 2 i/ ^25/) (^23/ ^iQl^i.^k 

1 


p p 


16n2(n- 1)(P4)2 


EE^ E E E(^*i --A.J(A ,3 -A, 3 ^ 


* 1 .* 2 ,* 3 1*4,751*6 *7 = 1 *8^*9 


X (Ajg Ajg)(J^ (J^ {Xi-^k Xi2k){Xi^k Xi^k){Xi4l Xi^l){Xi^l Xj^^^jXi^kXigk 

1 


P P 


EE^ E E E(^*i-^*^)(^* 3 -Am)(a.,-a, 3 ^ 


16n?(n — 1)‘^(P^)‘^ . 

k=l 1=1 ^l5^2,^35^45^5?^6 ^ 9/^10 

X (A^3 A^g)(T^ CTj^ i^^iik ^22/il) (^23/i; ^24/c)(^2l/ ^25/) (^23/ ^26/)^27/c^28/i;^29/^2io/ 

=An + Ai2 + Ai3 + Au 
After some tedious calculation, under condition (7), we can obtain that


^11 “ E{x%Xi{)){akia‘f - E^XikxD) 


+ 0{^) {(Ti ^a^^akiiaualaf - a‘fE{x%Xii) - alE{xikxl) + E{x%xl))) + o(^ctV(R2)) 

where S = (aij)ij=i,... ,p. Dehne F = (n^), according to the multivariate model, we can show 
that 


E{x%Xii) ( X] 




. 2=1 


vi=i 


m m m m 


E (EEEi: 

^ 2=1 j=l s=l t=l 


<3 + ^) + 3 vliVkjVij = A ^ vliVii + 3alaki 

i¥=j 


i=l 


2=1 


<A, 


m m 


^ i=l i=l 

=Aalai + Salttki 
Dehne E{zf) = d' < +cxo, 

3 


+ 3t^fcafei < A, 




ki 


+3cr^afc. 


. 2 = 1 


b(44) =-b E 


'^ki^i 




. 2=1 


0 = 1 


m m m m m m 


2=1 j=l 5=1 t=l r=l 'W=l 


m m m m m m 


=EEEEEE ki^ kj"^ kr"^ It^ wl^ j 

2=1 j=l s=l t=l r=l w=l 

mm mm 

+ ^27 + 9A) Avkjvlvij + (27 + 9A) X] vl^vuvl + 9 ^ vl^v^VksVis 

i=l i^j i^j 

2 2 / 2 2 ^ 27+9A^^ 2 2 / 2 2 n 

<17 X^ VkiM^ki + %) + -7- X^ X^ VkiViiiVkj + Vij) 


2 = 1 


i=i j=i 


m m m 


27 + 9A \ \ 2 2 / 2 2\ 9 V V V 2 2 / 2 2 \ 

+-^-X^ X^ VkiVijiVki + %) + 2 X^ X^ X^ '>^ki'Vij{Vk, + Vi,) 

i=l j=l i=l j=l s=l 

+ 63 + 18A) (cT^af + afal) 

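Thus, we obtain that $A_{11} = O(p^2n^{-4}) + o(n^{-2}\sigma^4\mathrm{tr}(R^2)) = o(n^{-2}\sigma^4\mathrm{tr}(R^2))$ by condition (C3).
Following the same procedure as for $A_{11}$, we can show that $A_{12}$, $A_{13}$ and $A_{14}$ are all $o(n^{-2}\sigma^4\mathrm{tr}(R^2))$. Hence,
we obtain the result that $E(A_1^2) = o(n^{-2}\sigma^4\mathrm{tr}(R^2))$.
Next, we rewrite $A_2$ as follows,

$$A_2 = \sum_{k=1}^{p}\left\{\frac{1}{4P_n^4}\sum^{*}(X_{i_1k} - X_{i_2k})(X_{i_3k} - X_{i_4k})(\Delta_{i_1} - \Delta_{i_2})(\Delta_{i_3} - \Delta_{i_4})\right\}(1 - \hat\sigma_k^2\sigma_k^{-2})^2\hat\sigma_k^{-2} = \sum_{k=1}^{p}C_kD_k.$$

By the Cauchy inequality, we obtain that

$$E(A_2^2) = E\left[\Big(\sum_{k=1}^{p}C_kD_k\Big)^2\right] \le E\left[\Big(\sum_{k=1}^{p}C_k^2\Big)\Big(\sum_{k=1}^{p}D_k^2\Big)\right] \le \left\{E\Big[\Big(\sum_{k=1}^{p}C_k^2\Big)^2\Big]\,E\Big[\Big(\sum_{k=1}^{p}D_k^2\Big)^2\Big]\right\}^{1/2}.$$

Following the same procedure as for $A_{11}$, we can show that $E(C_k^2C_l^2) = O(n^{-4})$ and $E(D_k^2D_l^2) = O(n^{-4})$.
Thus, $E(A_2^2) = O(p^2n^{-4}) = o(n^{-2}\sigma^4\mathrm{tr}(R^2))$ by condition (C3). This completes the proof.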


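4.2 Proof of Proposition 1

Firstly, after some tedious calculation, we can rewrite $\widehat{\mathrm{tr}(R^2)}$ as follows,

$$\widehat{\mathrm{tr}(R^2)} = \frac{1}{2P_n^4}\sum^{*}(X_{i_1} - X_{i_2})'D_S^{-1}(X_{i_3} - X_{i_4})(X_{i_3} - X_{i_2})'D_S^{-1}(X_{i_1} - X_{i_4})$$

$$= \frac{1}{P_n^2}\sum^{*}(X_{i_1}'D_S^{-1}X_{i_2})^2 - \frac{2}{P_n^3}\sum^{*}X_{i_1}'D_S^{-1}X_{i_2}X_{i_2}'D_S^{-1}X_{i_3} + \frac{1}{P_n^4}\sum^{*}X_{i_1}'D_S^{-1}X_{i_2}X_{i_3}'D_S^{-1}X_{i_4}.$$

Following the same procedure as in the proof of Theorem 1, we can show that

$$\frac{1}{P_n^2}\sum^{*}(X_{i_1}'D_S^{-1}X_{i_2})^2 = \frac{1}{P_n^2}\sum^{*}(X_{i_1}'D^{-1}X_{i_2})^2 + o_p(\mathrm{tr}(R^2)),$$

$$\frac{1}{P_n^3}\sum^{*}X_{i_1}'D_S^{-1}X_{i_2}X_{i_2}'D_S^{-1}X_{i_3} = \frac{1}{P_n^3}\sum^{*}X_{i_1}'D^{-1}X_{i_2}X_{i_2}'D^{-1}X_{i_3} + o_p(\mathrm{tr}(R^2)),$$

$$\frac{1}{P_n^4}\sum^{*}X_{i_1}'D_S^{-1}X_{i_2}X_{i_3}'D_S^{-1}X_{i_4} = \frac{1}{P_n^4}\sum^{*}X_{i_1}'D^{-1}X_{i_2}X_{i_3}'D^{-1}X_{i_4} + o_p(\mathrm{tr}(R^2)).$$

Thus,

$$\widehat{\mathrm{tr}(R^2)} = \frac{1}{P_n^2}\sum^{*}(\tilde X_{i_1}'\tilde X_{i_2})^2 - \frac{2}{P_n^3}\sum^{*}\tilde X_{i_1}'\tilde X_{i_2}\tilde X_{i_2}'\tilde X_{i_3} + \frac{1}{P_n^4}\sum^{*}\tilde X_{i_1}'\tilde X_{i_2}\tilde X_{i_3}'\tilde X_{i_4} + o_p(\mathrm{tr}(R^2)),$$

where $\tilde X_i = D^{-1/2}X_i$. Then, according to Theorem 2 in Chen, Zhang and Zhong (2010), we
can easily obtain the result.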
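
The first step of this proof is a purely algebraic rewriting, so it can be checked numerically on a toy data set. The sketch below assumes the four-index form with normalization $1/(2P_n^4)$ and the index pattern $(X_{i_1} - X_{i_2})'D_S^{-1}(X_{i_3} - X_{i_4})(X_{i_3} - X_{i_2})'D_S^{-1}(X_{i_1} - X_{i_4})$, and compares it with the three-sum form; the function names and the toy data are hypothetical.

```python
from itertools import permutations
from math import perm
import random
from statistics import variance


def scaled_gram(X):
    """G[i][j] = X_i' D_S^{-1} X_j, with D_S the diagonal of sample variances."""
    n, p = len(X), len(X[0])
    s2 = [variance([row[k] for row in X]) for k in range(p)]
    return [[sum(X[i][k] * X[j][k] / s2[k] for k in range(p)) for j in range(n)]
            for i in range(n)]


def tr_r2_four_index(X):
    """Four-index U-statistic form, normalized by 2 * P_n^4."""
    g, n = scaled_gram(X), len(X)

    def q(i, j, k, l):  # (X_i - X_j)' D_S^{-1} (X_k - X_l)
        return g[i][k] - g[i][l] - g[j][k] + g[j][l]

    total = sum(q(a, b, c, d) * q(c, b, a, d)
                for a, b, c, d in permutations(range(n), 4))
    return total / (2 * perm(n, 4))


def tr_r2_three_sums(X):
    """Equivalent form as a combination of two-, three- and four-index sums."""
    g, n = scaled_gram(X), len(X)
    s_2 = sum(g[i][j] ** 2 for i, j in permutations(range(n), 2))
    s_3 = sum(g[i][j] * g[j][k] for i, j, k in permutations(range(n), 3))
    s_4 = sum(g[i][j] * g[k][l] for i, j, k, l in permutations(range(n), 4))
    return s_2 / perm(n, 2) - 2.0 * s_3 / perm(n, 3) + s_4 / perm(n, 4)


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
four_index = tr_r2_four_index(X)
three_sums = tr_r2_three_sums(X)
```

Both functions return the same value up to rounding, which is the identity used in the first display of the proof.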

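4.3 Power Under Fixed Alternative

In this part, similar to Zhong and Chen (2011), we consider two scenarios of fixed alternatives
under which

$$\delta_\beta'\Sigma\delta_\beta \text{ is not } o(1).$$

One is

(13)

If $\delta_\beta'\Sigma\delta_\beta$ is truly bounded, (13) implies $\delta_\beta'\Sigma D^{-1}\Sigma D^{-1}\Sigma\delta_\beta = o(n^{-1}\mathrm{tr}(R^2))$, which mimics the
second part of (7). The other is

(14)

If $\delta_\beta'\Sigma\delta_\beta$ is truly bounded, (14) implies $n^{-1}\mathrm{tr}(R^2) = o(\delta_\beta'\Sigma D^{-1}\Sigma D^{-1}\Sigma\delta_\beta)$, which means there
is a larger discrepancy between $\beta$ and $\beta_0$.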

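Theorem 2 Assume conditions (C1)-(C3) hold. Then
(i) under the first fixed alternative (13),

$$\frac{n}{\sigma_{A1}}\left(T_n - \|D^{-1/2}\Sigma(\beta - \beta_0)\|^2\right) \xrightarrow{d} N(0, 1),$$

where

$$\sigma_{A1}^2 = 2\sigma^4\mathrm{tr}(R^2) + 2B_1^2\mathrm{tr}(R^2) + 4\sigma^2\mathrm{tr}(R^2)B_1 + 4\Delta(B_1 + \sigma^2)\mathrm{tr}(A_1 \circ A_3) + 2\Delta^2\mathrm{tr}[(A_0\,\mathrm{diag}(A_1))^2];$$

(ii) under the second fixed alternative (14),

$$\frac{\sqrt{n}}{2\sigma_{A2}}\left(T_n - \|D^{-1/2}\Sigma(\beta - \beta_0)\|^2\right) \xrightarrow{d} N(0, 1),$$

where

$$\sigma_{A2}^2 = (B_1 + \sigma^2)B_3 + B_2^2 + \Delta\,\mathrm{tr}(A_1 \circ A_2).$$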


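The proof of Theorem 2 is contained in a longer version of this article. The above theorem
implies that the asymptotic power of the test under the first fixed alternative (13) is

$$\beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-\frac{\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2 z_\alpha}{\sigma_{A1}} + \frac{n\|D^{-1/2}\Sigma\delta_\beta\|^2}{\sigma_{A1}}\right).$$

Note that $\sigma_{A1}^{-1}\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2$ is always bounded away from infinity because $B_1$ is not $o(1)$ and $\sigma_{A1}^2 >
2B_1^2\mathrm{tr}(R^2)$. When $B_1 \to \infty$, the first term converges to 0 and then our test attains at least
50% power in this case. Furthermore, if $n\sigma_{A1}^{-1}\|D^{-1/2}\Sigma\delta_\beta\|^2 \to \infty$, the power will converge
to 1. And the asymptotic power of the test under the second fixed alternative (14) is

$$\beta_{T_n}(\|\beta - \beta_0\|) = \Phi\left(-\frac{\sqrt{2\,\mathrm{tr}(R^2)}\,\sigma^2 z_\alpha}{2\sqrt{n}\,\sigma_{A2}} + \frac{\sqrt{n}\,\|D^{-1/2}\Sigma\delta_\beta\|^2}{2\sigma_{A2}}\right).$$

Under the fixed alternative (14), $n^{-1}\mathrm{tr}(R^2) = o(\sigma_{A2}^2)$, which implies that the first term converges to 0. And
then our test attains at least 50% power in this case. Similarly, if $n^{1/2}\sigma_{A2}^{-1}\|D^{-1/2}\Sigma\delta_\beta\|^2 \to \infty$,
our test is consistent.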

References 

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. Hoboken, NJ: Wiley.

Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem.
Statistica Sinica, 6, 311-29.

Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set
testing. The Annals of Statistics, 38, 808-835.

Chen, S. X., Zhang, L.-X. and Zhong, P.-S. (2010). Tests for high-dimensional covariance matrices.
Journal of the American Statistical Association, 105, 810-815.

Efron, B. and Tibshirani, R. (2007). On testing the significance of sets of genes. The Annals of Applied
Statistics, 1, 107-129.

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal
of the Royal Statistical Society, Ser. B, 70, 849-911.

Fan, J., Hall, P. and Yao, Q. (2007). To how many simultaneous hypothesis tests can normal, Student's t or
bootstrap calibrations be applied. Journal of the American Statistical Association, 102, 1282-1288.

Feng, L., Zou, C., Wang, Z. and Chen, B. (2013). Rank-based score tests for high-dimensional regression
coefficients. Electronic Journal of Statistics, 7, 2131-2149.

Feng, L., Zou, C., Wang, Z. and Zhu, L. (2014). Two-sample Behrens-Fisher problem for high-dimensional
data. Statistica Sinica, to appear.

Goeman, J., Finos, L. and van Houwelingen, J. C. (2011). Testing against a high dimensional alternative
in the generalized linear model: asymptotic type I error control. Biometrika, 98, 381-390.

Goeman, J., van de Geer, S. A. and van Houwelingen, J. C. (2006). Testing against a high-dimensional
alternative. Journal of the Royal Statistical Society, Ser. B, 68, 477-493.

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. New York: Academic
Press.

Kosorok, M. R. and Ma, S. (2007). Marginal asymptotics for the “Large p, Small n” paradigm: with
applications to microarray data. The Annals of Statistics, 35, 1456-1486.

Lee, A. J. (1990). U-Statistics: Theory and Practice. Marcel Dekker.

Meinshausen, N. (2008). Hierarchical testing of variable importance. Biometrika, 95, 265-278.

Newton, M., Quintana, F., Den Boon, J., Sengupta, S. and Ahlquist, P. (2007). Random-set methods
identify distinct aspects of the enrichment signal in gene-set analysis. The Annals of Applied
Statistics, 1, 85-106.

Portnoy, S. (1984). Asymptotic behavior of the M-estimators of p regression parameters when $p^2/n$ is
large: consistency. The Annals of Statistics, 12, 1298-1309.

Portnoy, S. (1985). Asymptotic behavior of the M-estimators of p regression parameters when $p^2/n$ is
large: normal approximation. The Annals of Statistics, 13, 1403-1417.

Schott, J. R. (2005). Testing for complete independence in high dimensions. Biometrika, 92, 951-956.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A.,
Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis:
a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the
National Academy of Sciences, 102, 15545-15550.

Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. Journal of the American
Statistical Association, 104, 1512-1524.

Zhong, P. S. and Chen, S. X. (2011). Tests for high dimensional regression coefficients with factorial designs.
Journal of the American Statistical Association, 106, 260-274.
